Classification Using Proximity Catch Digraphs (Technical Report)
Authors
Abstract
We employ random geometric digraphs to construct semi-parametric classifiers. These data-random digraphs belong to parametrized random digraph families called proximity catch digraphs (PCDs). A related geometric digraph family, the class cover catch digraph (CCCD), has been used to solve the class cover problem by means of its approximate minimum dominating set. CCCDs have shown relatively good performance in the classification of imbalanced data sets. Although CCCDs have a convenient construction in R^d, finding their minimum dominating sets is NP-hard, and their probabilistic behaviour is not mathematically tractable except for d = 1. On the other hand, a particular family of PCDs, called proportional-edge PCDs (PE-PCDs), has mathematically tractable minimum dominating sets in R^d; however, their construction in higher dimensions may be computationally demanding. More specifically, we show that the classifiers based on PE-PCDs are prototype-based classifiers for which the exact minimum number of prototypes (equivalent to minimum dominating sets) is found in time polynomial in the number of observations. We construct two types of classifiers based on PE-PCDs: a family of hybrid classifiers that depend on the location of the points of the training data set, and a family of classifiers based solely on class covers. We assess the classification performance of our PE-PCD based classifiers by extensive Monte Carlo simulations and compare it with that of other commonly used classifiers. We also show that, similar to CCCD classifiers, our classifiers perform relatively well in the presence of class imbalance.
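The CCCD class cover mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual CCCD construction, in which each target-class point gets an open ball whose radius is its distance to the nearest non-target point, and it assumes a simple greedy heuristic for the approximate minimum dominating set. The function name `cccd_greedy_cover` and the tuple-based point representation are illustrative choices, not taken from the report. PE-PCDs use different proximity regions and are not shown here.

```python
import math


def cccd_greedy_cover(target, non_target):
    """Greedy approximate minimum dominating set of a CCCD (illustrative sketch).

    target, non_target: lists of points, each a tuple of floats of equal length.
    Returns a list of (center, radius) prototype balls covering all target points.
    """
    # Radius of each target point's ball: distance to the nearest non-target point.
    radii = [min(math.dist(x, z) for z in non_target) for x in target]

    # covers[i] = indices of target points caught by the open ball around target[i].
    # A point always catches itself, which guarantees that the loop below terminates.
    covers = [
        {j for j, y in enumerate(target) if j == i or math.dist(x, y) < r}
        for i, (x, r) in enumerate(zip(target, radii))
    ]

    uncovered = set(range(len(target)))
    prototypes = []
    while uncovered:
        # Pick the ball that catches the most still-uncovered target points.
        best = max(range(len(target)), key=lambda i: len(covers[i] & uncovered))
        prototypes.append((target[best], radii[best]))
        uncovered -= covers[best]
    return prototypes


if __name__ == "__main__":
    target_points = [(0.0,), (1.0,), (2.0,), (10.0,)]
    non_target_points = [(5.0,)]
    for center, radius in cccd_greedy_cover(target_points, non_target_points):
        print(center, radius)
```

The greedy choice gives only an approximation to the minimum dominating set, which is consistent with the abstract's remark that the exact problem is NP-hard for CCCDs.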
Related articles
A new family of proximity graphs: Class cover catch digraphs
Motivated by issues in machine learning and statistical pattern classification, we investigate a class cover problem (CCP) with an associated family of directed graphs—class cover catch digraphs (CCCDs). CCCDs are a special case of catch digraphs. Solving the underlying CCP is equivalent to finding a smallest cardinality dominating set for the associated CCCD, which in turn provides regularizat...
The Use of Domination Number of a Random Proximity Catch Digraph for Testing Spatial Patterns of Segregation and Association
Priebe et al. (2001) introduced the class cover catch digraphs and computed the distribution of the domination number of such digraphs for one-dimensional data. In higher dimensions these calculations are extremely difficult due to the geometry of the proximity regions, and only upper bounds are available. In this article, we introduce a new type of data-random proximity map and the associated ...
A Probabilistic Characterization of Random Proximity Catch Digraphs and the Associated Tools
Proximity catch digraphs (PCDs) are based on proximity maps which yield proximity regions and are special types of proximity graphs. PCDs are based on the relative allocation of points from two or more classes in a region of interest and have applications in various fields. In this article, we provide auxiliary tools for and various characterizations of PCDs based on their probabilistic behavio...
Extension of one-dimensional proximity regions to higher dimensions
Proximity maps and regions are defined based on the relative allocation of points from two or more classes in an area of interest and are used to construct random graphs called proximity catch digraphs (PCDs) which have applications in various fields. The simplest of such maps is the spherical proximity map which maps a point from the class of interest to a disk centered at the same point with ...
Relative density of the random r-factor proximity catch digraph for testing spatial patterns of segregation and association
Statistical pattern classification methods based on data-random graphs were introduced recently. In this approach, a random directed graph is constructed from the data using the relative positions of the data points from various classes. Different random graphs result from different definitions of the proximity region associated with each data point and different graph statistics can be employe...
Journal:
- CoRR
Volume: abs/1705.07600
Publication date: 2017